Training LoRA - 基素基

install sd-webui-additional-networks

https://github.com/bmaltais/kohya_ss

bmaltais/kohya_ss

This repository provides a Windows-focused Gradio GUI for Kohya's Stable Diffusion trainers. The GUI allows you to set the training parameters and generate and run the required CLI commands to train the model.

https://www.youtube.com/watch?v=N4_-fB62Hwk

For now I will do it without the GUI.

sd-scripts/train_network_README-ja.md at main · kohya-ss/sd-scripts · GitHub

prepare

Install PowerShell

https://learn.microsoft.com/ja-jp/powershell/scripting/whats-new/migrating-from-windows-powershell-51-to-powershell-7?view=powershell-7.3

Installing Git for Windows

[Windows 11] Set/edit the Path environment variable so that your own commands can be run: Tech TIPS - ＠IT

Windows | Installing Nu | Nushell

Install Python on Windows

accelerate config

accelerate configuration saved at C:\Users\YOU/.cache\huggingface\accelerate\default_config.yaml

https://dev.classmethod.jp/articles/change-venv-python-version/

every time (activate the venv before running the scripts)

$ .\venv\Scripts\activate

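Step1. Preparing the training data

Prepare the images you want to train on and put them in any folder. No preparation such as resizing is needed. However, for images smaller than the training resolution, it is recommended to upscale them beforehand while preserving quality, for example with super-resolution.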

video:10:00

directory

sd-scripts

train_data //images

finetune //python scripts

Step2. Automatic captioning

Skip this step if you train with tags only, without captions.

Captioning with BLIP

video:34:00

$ python finetune\make_captions.py --batch_size 8 .\train_data

Step3. Tagging

If you do not add danbooru tags at all, go on to "Preprocessing captions and tag information".

Tagging is done with DeepDanbooru or WD14Tagger. WD14Tagger seems to be more accurate.

WD14Tagger

Do this inside AUTOMATIC1111

Because otherwise I get tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: cudaGetErrorString symbol not found.

directory

\stable-diffusion-webui-docker\data\tiggging-it //images

execute

https://gyazo.com/c2aacef4a3c97968e9d3a6e9e4331b1c

hint: Add Additional tag

.txt files will appear in the data directory

Step4. Preprocessing captions and tag information

To make them easy for the scripts to process, combine the captions and tags into a single metadata file.

caption

Move .\stable-diffusion-webui-docker\data\tiggging-it\*.txt to sd-scripts\train_data

$ python .\finetune\merge_captions_to_metadata.py .\train_data\ meta_cap.json

All captions will be merged.

If meta_cap.json does not exist, it will be created.

code:meta_cap.json

{

"image1" : {

"caption": "a girl in black hole"

},

"image2": {

"caption": "anime girl with cats"

}

}
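A rough sketch of what this merge step amounts to, not the actual merge_captions_to_metadata.py. The function name merge_captions and the .caption extension are assumptions for illustration; it only shows the idea of one entry per image name with a "caption" field.

```python
import json
from pathlib import Path


def merge_captions(train_dir, out_json, caption_ext=".caption"):
    """Collect one caption file per image into a single metadata JSON file.

    Hypothetical helper: caption_ext is an assumed file extension.
    """
    out_path = Path(out_json)
    # Start from the existing metadata if the file is already there.
    if out_path.exists():
        metadata = json.loads(out_path.read_text(encoding="utf-8"))
    else:
        metadata = {}
    for cap_file in sorted(Path(train_dir).glob(f"*{caption_ext}")):
        # Use the first line of the caption file, keyed by the image name.
        lines = cap_file.read_text(encoding="utf-8").splitlines()
        caption = lines[0].strip() if lines else ""
        metadata.setdefault(cap_file.stem, {})["caption"] = caption
    out_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return metadata
```

Because existing entries are kept and only the "caption" field is set, running it again on the same folder just refreshes the captions.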

tag

$ cd sd-scripts

$ python .\finetune\merge_dd_tags_to_metadata.py .\train_data --in_json meta_cap.json meta_cap_dd.json

code:meta_cap_dd.json

{

"image1" : {

"caption": "a girl in black hole",

"tags": "1girl, open mouth"

},

"image2": {

"caption": "anime girl with cats",

"tags": "1girl, solo, black hair"

}

}

cleaning

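At this point the captions and ... tags have been combined into the metadata file.

However, the automatically generated captions are a bit off because of inconsistent wording and the like (*), and the tags may contain underscores or have a rating attached (in the case of DeepDanbooru), so it is a good idea to clean the captions and tags, for example with your editor's find-and-replace.

* For example, when training on anime-style girls, the captions vary between girl/girls/woman/women. "anime girl" might also be better written simply as "girl".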

$ python .\finetune\clean_captions_and_tags.py meta_cap_dd.json meta_clean.json

Removes overlapping tags such as white shirt and shirt

woman, female, lady, person ... -> girl
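To make the kind of cleaning described above concrete, here is a minimal sketch. It is not the logic of clean_captions_and_tags.py; the function clean_tags, the synonym table, and the "rating:" prefix check are assumptions chosen to mirror the notes (underscores, ratings, overlapping tags, woman -> girl).

```python
def clean_tags(tags, synonyms=None):
    """Normalize a comma-separated tag string (illustrative rules only)."""
    if synonyms is None:
        # Assumed mapping, taken from the note "woman, female, lady, person -> girl".
        synonyms = {"woman": "girl", "female": "girl", "lady": "girl", "person": "girl"}
    cleaned = []
    for raw in tags.split(","):
        tag = raw.strip().replace("_", " ")  # underscores become spaces
        if not tag or tag.startswith("rating:"):
            continue  # drop empty entries and rating tags
        tag = synonyms.get(tag, tag)
        if tag not in cleaned:
            cleaned.append(tag)
    # Drop a generic tag when a more specific one ends with it
    # ("shirt" is dropped when "white shirt" is present).
    result = []
    for tag in cleaned:
        covered = any(other != tag and other.endswith(" " + tag) for other in cleaned)
        if not covered:
            result.append(tag)
    return ", ".join(result)


print(clean_tags("woman, white_shirt, shirt, rating:safe, open_mouth"))
# girl, white shirt, open mouth
```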

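Step5. Pre-computing the latents

To speed up training, obtain the latent representations of the images in advance and save them to disk. Bucketing (grouping the training data by aspect ratio) is done at the same time.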

$ python .\finetune\prepare_buckets_latents.py train_data meta_clean.json meta_lat.json model.ckpt --batch_size 4 --max_resolution 512,512 --mixed_precision no

code:powershell

found 19 images.

loading existing metadata: meta_clean.json

load VAE: vae-model.ckpt

100%|████████████████████████████████████████████████████████████████████| 19/19 [00:03<00:00, 5.24it/s]

bucket 0 (320, 704): 2

bucket 1 (320, 768): 1

bucket 2 (384, 640): 4

bucket 3 (448, 576): 6

bucket 4 (512, 512): 2

bucket 5 (576, 448): 3

bucket 6 (640, 384): 1

mean ar error: 0.054515281612950744

writing metadata: meta_lat.json

done!

The latents are saved in the training data folder in numpy npz format.
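The bucketing idea can be sketched as picking, for each image, the bucket whose aspect ratio is closest. This is a simplified illustration and not the code of prepare_buckets_latents.py. The bucket sizes are copied from the log above and assumed to be (width, height); pick_bucket is a hypothetical name.

```python
# Bucket resolutions as printed in the log above, assumed to be (width, height).
BUCKETS = [(320, 704), (320, 768), (384, 640), (448, 576),
           (512, 512), (576, 448), (640, 384)]


def pick_bucket(width, height, buckets=BUCKETS):
    """Return the bucket with the closest aspect ratio and the ratio error."""
    aspect = width / height
    best = min(buckets, key=lambda wh: abs(wh[0] / wh[1] - aspect))
    return best, abs(best[0] / best[1] - aspect)


print(pick_bucket(1024, 1024))   # ((512, 512), 0.0)
print(pick_bucket(1000, 2000)[0])  # (320, 704)
```

Averaging the second return value over all images gives a number in the spirit of the "mean ar error" line in the log, although the real script may compute it differently.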

train_data folder

With 16GB of VRAM, --max_resolution can be 512,704 or 512,768

on RTX3090

batch 8 / 512x768 OK

Step6. Running the training

code:powershell

accelerate launch --num_cpu_threads_per_process 1 fine_tune.py `
--pretrained_model_name_or_path=model.ckpt `
--in_json meta_lat.json `
--train_data_dir=train_data `
--output_dir=fine_tuned `
--shuffle_caption `
--train_batch_size=1 --learning_rate=5e-6 --max_train_steps=10000 `
--use_8bit_adam --xformers --gradient_checkpointing `
--mixed_precision=bf16 `
--save_every_n_epochs=4

code:powershell

running training / 学習開始

num examples / サンプル数: 19

num batches per epoch / 1epochのバッチ数: 19

num epochs / epoch数: 527

batch size per device / バッチサイズ: 1

total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ（並列学習、勾配合計含む）: 1

gradient ccumulation steps / 勾配を合計するステップ数 = 1

total optimization steps / 学習ステップ数: 10000

batch_size = 1, No --gradient_checkpointing

https://gyazo.com/2c6bb2de9570b760a7300ab0b1bacdcd

1.6 it/s

10000 steps

10000/1.6/3600 = 1.7h

a few GB per saved checkpoint (one every 4 epochs)

a few hundred GB per training run
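The numbers above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the values from the log (10000 steps, 19 batches per epoch, saving every 4 epochs) and the rate of 1.6 iterations per second used in the calculation above; estimate_run is a hypothetical helper and ignores any per-epoch overhead.

```python
import math


def estimate_run(max_train_steps, batches_per_epoch, it_per_sec, save_every_n_epochs):
    """Estimate epoch count, pure stepping time in hours, and intermediate saves."""
    epochs = math.ceil(max_train_steps / batches_per_epoch)
    hours = max_train_steps / it_per_sec / 3600
    checkpoints = epochs // save_every_n_epochs
    return epochs, hours, checkpoints


epochs, hours, checkpoints = estimate_run(10000, 19, 1.6, 4)
print(f"{epochs} epochs, about {hours:.1f} h, {checkpoints} checkpoints")
# 527 epochs, about 1.7 h, 131 checkpoints
```

With a few GB per checkpoint, roughly 130 saves is where the "few hundred GB" comes from.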

Step 7. Use Additional Networks

Move the .ckpt to \stable-diffusion-webui-docker\data\config\auto\extensions\sd-webui-additional-networks\models\lora

Enable Additional Networks on WebUI

Select Model and set weight

Generate, and an error occurs!

code:powershell

webui-docker-auto-1 | LoRA weight_unet: 1, weight_tenc: 1, model: vae-model(****)

webui-docker-auto-1 | Error verifying pickled file from /stable-diffusion-webui/extensions/sd-webui-additional-networks/models/lora/vae-model.ckpt:

webui-docker-auto-1 | Traceback (most recent call last):

webui-docker-auto-1 | File "/stable-diffusion-webui/modules/safe.py", line 135, in load_with_extra

webui-docker-auto-1 | check_pt(filename, extra_handler)

webui-docker-auto-1 | File "/stable-diffusion-webui/modules/safe.py", line 93, in check_pt

webui-docker-auto-1 | unpickler.load()

webui-docker-auto-1 | File "/stable-diffusion-webui/modules/safe.py", line 62, in find_class

webui-docker-auto-1 | raise Exception(f"global '{module}/{name}' is forbidden")

webui-docker-auto-1 | Exception: global 'torch/BFloat16Storage' is forbidden

webui-docker-auto-1 |

webui-docker-auto-1 |

webui-docker-auto-1 | The file may be malicious, so the program is not going to read it.

webui-docker-auto-1 | You can skip this check with --disable-safe-unpickle commandline argument.

webui-docker-auto-1 |

webui-docker-auto-1 |

webui-docker-auto-1 | Error running process_batch: /stable-diffusion-webui/extensions/sd-webui-additional-networks/scripts/additional_networks.py

webui-docker-auto-1 | Traceback (most recent call last):

webui-docker-auto-1 | File "/stable-diffusion-webui/modules/scripts.py", line 395, in process_batch

webui-docker-auto-1 | script.process_batch(p, *script_args, **kwargs)

webui-docker-auto-1 | File "/stable-diffusion-webui/extensions/sd-webui-additional-networks/scripts/additional_networks.py", line 209, in process_batch

webui-docker-auto-1 | network, info = lora_compvis.create_network_and_apply_compvis(du_state_dict, weight_tenc, weight_unet, text_encoder, unet)

webui-docker-auto-1 | File "/stable-diffusion-webui/extensions/sd-webui-additional-networks/scripts/lora_compvis.py", line 83, in create_network_and_apply_compvis

webui-docker-auto-1 | for key, value in du_state_dict.items():

webui-docker-auto-1 | AttributeError: 'NoneType' object has no attribute 'items'

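I chose the wrong way

When training, specify train_network.py instead of fine_tune.py. Almost all options (except those related to saving the model) can be used as they are. Then add the LoRA-related options (network_dim, network_alpha, etc.) as described in "Options for training LoRA".

Note that training also works without "pre-computing the latents". Training is slower because the latents are obtained from the VAE at training time (or at caching time), but in exchange color_aug becomes available.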

https://github.com/kohya-ss/sd-scripts/blob/main/train_network_README-ja.md

but Training LoRA#63f907b2774b17000013a129

it's written in https://github.com/kohya-ss/sd-scripts/blob/main/fine_tune_README_ja.md#学習の実行

changed

Use train_network

higher learning rate

save as safetensors

learning target is LoRA

larger save interval in epochs (less storage)

remove --gradient_checkpointing (speed up)

code:powershell(NAI)

accelerate launch --num_cpu_threads_per_process 1 train_network.py `
--pretrained_model_name_or_path=model.ckpt `
--in_json meta_lat.json `
--train_data_dir=train_data `
--output_dir=lora_train1 `
--shuffle_caption `
--train_batch_size=1 `
--learning_rate=1e-4 `
--max_train_steps=10000 `
--use_8bit_adam --xformers `
--mixed_precision=bf16 `
--save_every_n_epochs=1000 `
--save_model_as=safetensors `
--network_module=networks.lora
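The "changed" list can also be written down as a small transformation from the fine_tune.py options to the train_network.py options. This is just a sketch for keeping the two commands in sync; to_lora_command and FINE_TUNE_ARGS are hypothetical names, and the option values are the ones used in these notes.

```python
# Options of the fine_tune.py run from Step6 (True means a flag without a value).
FINE_TUNE_ARGS = {
    "pretrained_model_name_or_path": "model.ckpt",
    "in_json": "meta_lat.json",
    "train_data_dir": "train_data",
    "output_dir": "fine_tuned",
    "shuffle_caption": True,
    "train_batch_size": "1",
    "learning_rate": "5e-6",
    "max_train_steps": "10000",
    "use_8bit_adam": True,
    "xformers": True,
    "gradient_checkpointing": True,
    "mixed_precision": "bf16",
    "save_every_n_epochs": "4",
}


def to_lora_command(fine_tune_args):
    """Build the train_network.py argument list from the fine_tune.py options."""
    args = dict(fine_tune_args)  # copy so the input is not modified
    args.pop("gradient_checkpointing", None)  # removed to speed up
    args.update({
        "output_dir": "lora_train1",
        "learning_rate": "1e-4",            # higher learning rate
        "save_every_n_epochs": "1000",      # save less often
        "save_model_as": "safetensors",
        "network_module": "networks.lora",  # learning target is LoRA
    })
    cmd = ["accelerate", "launch", "--num_cpu_threads_per_process", "1",
           "train_network.py"]
    for key, value in args.items():
        cmd.append(f"--{key}" if value is True else f"--{key}={value}")
    return cmd


print(" ".join(to_lora_command(FINE_TUNE_ARGS)))
```

The returned list could be passed to subprocess.run from inside the activated venv instead of typing the command by hand.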

code:powershell

running training / 学習開始

num train images * repeats / 学習画像の数×繰り返し回数: 19

num reg images / 正則化画像の数: 0

num batches per epoch / 1epochのバッチ数: 19

num epochs / epoch数: 527

batch size per device / バッチサイズ: 1

total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ（並列学習、勾配合計含む）: 1

gradient accumulation steps / 勾配を合計するステップ数 = 1

total optimization steps / 学習ステップ数: 10000

~1.6 it/s

https://gyazo.com/3b632a6b946da8625519bc8bff1a81c1

more memory is available

Abort and reconfigure

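Improving accuracy and speed when there is memory to spare

First, removing gradient_checkpointing increases speed. However, the batch size you can set becomes smaller, so choose the settings while balancing accuracy and speed.

Increasing the batch size improves speed and accuracy. Increase it within the range that memory allows while checking the speed per data item (when memory is almost full, the speed can actually drop).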

batch size=2

code:powershell

running training / 学習開始

num train images * repeats / 学習画像の数×繰り返し回数: 19

num reg images / 正則化画像の数: 0

num batches per epoch / 1epochのバッチ数: 12

num epochs / epoch数: 834

batch size per device / バッチサイズ: 2

total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ（並列学習、勾配合計含む）: 2

gradient accumulation steps / 勾配を合計するステップ数 = 1

total optimization steps / 学習ステップ数: 10000

2.36 it/s

1.18h

Wrong. There is a wait at every epoch.

It took 6-7 hours.
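A small sketch for comparing runs by speed per data item rather than by it/s, and for seeing how much time per epoch the step rate alone does not explain. Both helper names are hypothetical, and the 6.5 hours is simply the midpoint of the 6-7 hours noted above, so the overhead figure is only a rough guess.

```python
def samples_per_second(it_per_sec, batch_size):
    """Images processed per second: iterations per second times batch size."""
    return it_per_sec * batch_size


def implied_epoch_overhead(total_hours, steps, it_per_sec, epochs):
    """Seconds per epoch that the step rate alone does not account for."""
    pure_step_seconds = steps / it_per_sec
    return (total_hours * 3600 - pure_step_seconds) / epochs


# Batch size 1 at ~1.6 it/s versus batch size 2 at 2.36 it/s.
print(samples_per_second(1.6, 1), samples_per_second(2.36, 2))

# Assumed total of 6.5 h for 10000 steps over 834 epochs.
print(round(implied_epoch_overhead(6.5, 10000, 2.36, 834), 1), "s per epoch")
```

If the overhead really is per epoch, a tiny dataset with hundreds of epochs pays it hundreds of times, which would explain why the simple steps / (it/s) estimate was so far off.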

The safetensors file that appears is < 1MB!

https://gyazo.com/a7436da2ad37dbf438bd88c117a302a4
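To check what actually ended up in such a small file, the header of a .safetensors file can be read with the standard library alone: the file starts with an 8-byte little-endian length followed by a JSON header listing the tensors. The sketch below relies on that layout. The helper names are hypothetical, and the "lora_" key prefix is an assumption about how the LoRA weights are named.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: header length as an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize_tensors(path, lora_prefix="lora_"):
    """Count all tensors and those whose name starts with lora_prefix (assumed)."""
    header = read_safetensors_header(path)
    names = [name for name in header if name != "__metadata__"]
    lora_names = [name for name in names if name.startswith(lora_prefix)]
    return len(names), len(lora_names)
```

Calling summarize_tensors on the saved file returns a pair (total tensors, tensors with the assumed LoRA prefix), which is a quick sanity check before loading it in Additional Networks.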

a bit higher at the beginning

https://github.com/kohya-ss/sd-scripts/blob/main/train_network_README-ja.md#loraの学習のためのオプション

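In train_network.py, specify the name of the module to train with the --network_module option. The one for LoRA is networks.lora, so specify that.

Note that a learning rate somewhat higher than for ordinary DreamBooth or fine tuning, around 1e-4, seems to work well.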

XY plot

The list of models can be fetched with the button next to the choices. Select one of the models in Model ? of Additional Networks beforehand. The list of models in the same folder as that model is fetched.

https://gyazo.com/10609c91c0be280bf8bcbc2798cb61ec